Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a custom implementation LocalFileSystem::list_with_offset #7019

Merged
merged 10 commits into from
Feb 8, 2025

Conversation

corwinjoy
Copy link
Contributor

Which issue does this PR close?

Closes #7018.

Rationale for this change

Performance enhancement needed for network file system as described in the PR.

What changes are included in this PR?

Add an implementation LocalFileSystem::list_with_offset so that it can intelligently scan files and drastically reduce the number of statx system calls. Basically, we add a prefilter to pull only what is needed in-place.

Are there any user-facing changes?

None.

@github-actions github-actions bot added the object-store Object Store Interface label Jan 24, 2025
@corwinjoy
Copy link
Contributor Author

The initial work for this PR was done by @dplasto, who reported the issue and provided an initial implementation. Further suggestions were provided by @adamreeve .

@tustvold
Copy link
Contributor

It would appear this is legitimately failing CI

@alamb alamb marked this pull request as draft January 25, 2025 17:13
@alamb
Copy link
Contributor

alamb commented Jan 25, 2025

Thanks @corwinjoy and @tustvold -- marking this as a draft as we work through the test failures

@corwinjoy
Copy link
Contributor Author

corwinjoy commented Jan 25, 2025 via email

@etseidl
Copy link
Contributor

etseidl commented Jan 26, 2025

I'm not sure what is happening with the CI. It seems to be failing formatting. But, I ran 'cargo format' and clippy as specified in Contributing.md so I am not sure what is wrong.

Are you running from the arrow-rs directory or the object_store directory. When I run cargo fmt --all -- --check from the object_store directory I get similar errors to CI.

@corwinjoy
Copy link
Contributor Author

corwinjoy commented Jan 26, 2025

Ah, thanks for the clarification @etseidl! From the guidelines I didn't realize that I had to run these commands in the object-store rather than the arrow-rs directory. Doing that, I do get suggested changes which I have applied. Hopefully this will clear the CI checks.

@alamb
Copy link
Contributor

alamb commented Jan 27, 2025

Ah, thanks for the clarification @etseidl! From the guidelines I didn't realize that I had to run these commands in the object-store rather than the arrow-rs directory. Doing that, I do get suggested changes which I have applied. Hopefully this will clear the CI checks.

The emulator tests appear to be failing: https://github.com/apache/arrow-rs/actions/runs/12970828706/job/36180799417?pr=7019

---- local::tests::test_path_with_offset stdout ----
thread 'local::tests::test_path_with_offset' panicked at src/local.rs:1440:66:
called `Result::unwrap()` on an `Err` value: Generic { store: "LocalFileSystem", source: UnableToCanonicalize { path: "../testing/data/arrow-ipc-file", source: Os { code: 2, kind: NotFound, message: "No such file or directory" } } }
stack backtrace:

…or testing rather than pointing to existing dir.
@corwinjoy
Copy link
Contributor Author

OK. I have changed this test to use a temporary test and files which should make it more portable. We will see what the emulator says about this.

@alamb
Copy link
Contributor

alamb commented Jan 29, 2025

OK. I have changed this test to use a temporary test and files which should make it more portable. We will see what the emulator says about this.

Looks like the tests are good

@alamb
Copy link
Contributor

alamb commented Jan 29, 2025

This PR looks ready for review to me. @corwinjoy shall we mark it as ready for review?

@corwinjoy
Copy link
Contributor Author

@alamb Yes please. Mark it as ready for review.

@alamb alamb marked this pull request as ready for review January 29, 2025 21:53
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @corwinjoy -- I think this code is better than what is on main.

I am somewhat concerned that the listing with offset is going to result in non deterministic results but I don't think this PR makes that any better/worse than what is on main

Thanks again

None => config.root.to_file_path().unwrap(),
};

let walkdir = WalkDir::new(root_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some digging and I don't think WalkDir guarantees anything about the sort order of the directory entries

thus if the OS returns a different order on the next call to list_with_offset this may end up with a different result

However I think the existing default impl of list_with_offset (which just calls list() and takes the offset) has the same problem so I don't think it needs to be fixed in this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does look to me like you can explicitly set an order via https://docs.rs/walkdir/latest/walkdir/struct.WalkDir.html#method.sort_by

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb I agree with all this and thanks for the helpful comments! I think your concern about not having a deterministic order is a good one and agree this is the existing behavior. I also think we should change this, but propose we do that in a follow-up PR since it does change the behavior and could (potentially) break an assumption somewhere. I think we can just use https://docs.rs/walkdir/latest/walkdir/struct.WalkDir.html#method.sort_by_file_name
which should be fine and I believe matches what the offset filter expects.

.map(|x| String::from(x.filename().unwrap()))
.collect();

// Check result with direct filesystem read
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this depends on the file system returning the same order each time. Again this is probably ok but it seems like it could result in a non deterministic test failure over time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it does not depend on this because later, in line 1485 we sort both result sets before comparing. So the initial sort order that gets returned does not matter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that if the underlying directory list was different, then the values after skipping the first 10 (for example) might be different, even if we sorted the values that came out.

But since this doesn't seem to happen in practice I am not going to worry too much about it

Copy link
Contributor

@tustvold tustvold Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not had time to review this in detail but a couple of points that may or may not be relevant:

Or to phrase it either way, we shouldn't need to sort, nor should we rely on data being sorted

For context on why we don't guarantee any ordering, it is because many stores, e.g. S3 Express, don't

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense -- thank you

To be clear I don't think this PR changes anything about the object_store operation in the face of inconsistent ordering between calls (but I was speculating on what would happen if the ordering returned by the underlying systems was different between calls)

object_store/src/local.rs Show resolved Hide resolved
@alamb
Copy link
Contributor

alamb commented Feb 7, 2025

I'll plan to merge this PR in a day or two unless there are any more comments or anyone would like longer to review

@alamb alamb merged commit 5bf4a6c into apache:main Feb 8, 2025
14 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 8, 2025

Thanks again @corwinjoy

phillipleblanc pushed a commit to spiceai/arrow-rs that referenced this pull request Feb 10, 2025
…he#7019)

* Initial change from Daniel.

* Upgrade unit test to be more generic.

* Add comments on why we have filter

* Cleanup unit tests.

* Update object_store/src/local.rs

Co-authored-by: Adam Reeve <[email protected]>

* Add changes suggested by Adam.

* Cleanup match error.

* Apply formatting changes suggested by cargo +stable fmt --all.

* Apply cosmetic changes suggested by clippy.

* Upgrade test_path_with_offset to create temporary directory + files for testing rather than pointing to existing dir.

---------

Co-authored-by: Adam Reeve <[email protected]>
alamb added a commit that referenced this pull request Feb 12, 2025
…e32 (#7074)

* Support converting large dates (i.e. +10999-12-31) from string to Date32

* Fix lint

* Update arrow-cast/src/parse.rs

Co-authored-by: Andrew Lamb <[email protected]>

* fix: issue introduced in #6833 -  less than equal check for scale in decimal conversion (#7070)

* fix <= check for scale in decimal conversion

* Update arrow-cast/src/cast/mod.rs

name change

Co-authored-by: Arttu <[email protected]>

* remove incorrect comment

---------

Co-authored-by: Arttu <[email protected]>

* minor: re-export `OffsetBufferBuilder` in `arrow` crate (#7077)

* Add another decimal cast edge test case (#7078)

* Add another decimal cast edge test case

Before 1019f5b this test would fail, as
the cast produced 1. 0 is an edge case worth explicitly testing for.

* typo/fmt

Co-authored-by: Felipe Oliveira Carvalho <[email protected]>

---------

Co-authored-by: Felipe Oliveira Carvalho <[email protected]>

* Support both 0x01 and 0x02 as type for list of booleans in thrift metadata (#7052)

* Support both 0x01 and 0x02 as type for list of booleans

* Also support 0 for false inside boolean collections

* Use hex notation in tests

* Fix LocalFileSystem with range request that ends beyond end of file (#6751)

* Fix LocalFileSystem with range request that ends beyond end of file

* fix windows

* add comment

* Seek error

* fix seek check

* remove windows flag

* Get file length from file metadata

* Introduce `UnsafeFlag` to manage disabling `ArrayData` validation (#7027)

* Introduce UnsafeFlag to manage disabling validation

* fix docs

* Refactor arrow-ipc: Rename `ArrayReader` to `RecodeBatchDecoder` (#7028)

* Rename `ArrayReader` to `RecordBatchDecoder`

* Remove alias for `self`

* Minor: Update release schedule (#7086)

* Minor: Update release schedule

* realism

* Refactor some decimal-related code and tests (#7062)

* Refactor some decimal-related code and tests in preparation for adding Decimal32 and Decimal64 support

* Fixed symbol

* Apply PR feedback

* Fixed format problem

* Fixed logical merge conflicts

* PR feedback

* Refactor arrow-ipc: Move `create_*_array` methods into `RecordBatchDecoder` (#7029)

* Move `create_primitive_array` into RecordBatchReader

* Move `create_list-array` into RecordBatchReader

* Move `create_dictionay_array` into RecordBatchReader

* Print Parquet BasicTypeInfo id when present (#7094)

* Print Parquet BasicTypeInfo id when present

* Improve print_schema documentation

* tiny cleanup

* Add a custom implementation `LocalFileSystem::list_with_offset`  (#7019)

* Initial change from Daniel.

* Upgrade unit test to be more generic.

* Add comments on why we have filter

* Cleanup unit tests.

* Update object_store/src/local.rs

Co-authored-by: Adam Reeve <[email protected]>

* Add changes suggested by Adam.

* Cleanup match error.

* Apply formatting changes suggested by cargo +stable fmt --all.

* Apply cosmetic changes suggested by clippy.

* Upgrade test_path_with_offset to create temporary directory + files for testing rather than pointing to existing dir.

---------

Co-authored-by: Adam Reeve <[email protected]>

* fix: first none/empty list in `ListArray` panics in `cast_with_options` (#7065)

* fix: first none in `ListArray` panics in `cast_with_options`

* simplify

* fix

* Update arrow-cast/src/cast/list.rs

Co-authored-by: Jeffrey Vo <[email protected]>

---------

Co-authored-by: Jeffrey Vo <[email protected]>

* Benchmarks for Arrow IPC writer (#7090)

* Add benchmarks for Arrow IPC writer

* Add benchmarks for Arrow IPC writer

* reuse target buffer

* rename, etc

* Add compression type

* update

---------

Co-authored-by: Andy Grove <[email protected]>

* Minor: Clarify documentation on `NullBufferBuilder::allocated_size` (#7089)

* Minor: Clarify documentaiton on NullBufferBuilder::allocated_size

* add note about why allocations are 64 bytes

* Add more tests for edge cases

* Add negative test case for incorrectly formatted large dates

---------

Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Himadri Pal <[email protected]>
Co-authored-by: Arttu <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
Co-authored-by: Jörn Horstmann <[email protected]>
Co-authored-by: Kyle Barron <[email protected]>
Co-authored-by: Curt Hagenlocher <[email protected]>
Co-authored-by: Devin Smith <[email protected]>
Co-authored-by: Corwin Joy <[email protected]>
Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: irenjj <[email protected]>
Co-authored-by: Jeffrey Vo <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LocalFileSystem::list_with_offset is very slow over network file system
4 participants